Name pronunciation in German text-to-speech synthesis

Authors

  • Stefanie Jannedy
  • Bernd Möbius
Abstract

We describe the name analysis and pronunciation component in the German version of the Bell Labs multilingual text-to-speech system. We concentrate on street names because they encompass interesting aspects of geographical and personal names. The system was implemented in the framework of finite-state transducer technology, using linguistic criteria as well as frequency distributions derived from a database. In evaluation experiments, we compared the performances of the general-purpose text analysis and the name-specific system on training and test materials. The name-specific system significantly outperforms the generic system. The error rates compare favorably with results reported in the research literature. Finally, we discuss areas for future work.

1 Introduction

The correct pronunciation of names is one of the biggest challenges for text-to-speech (TTS) conversion systems. At the same time, many current or envisioned applications, such as reverse directory systems, automated operator services, catalog ordering or navigation systems, to name just a few, crucially depend upon an accurate and intelligible pronunciation of names. Besides these specific applications, any kind of well-formed text input to a general-purpose TTS system is extremely likely to contain names, and the system has to be well equipped to process these names.

This requirement was the main motivation to develop a name analysis and pronunciation component for the German version of the Bell Labs multilingual text-to-speech system (GerTTS) (Möbius et al., 1996).

Names are conventionally categorized into personal names (first names and surnames), geographical names (place, city and street names), and brand names (organization, company and product names). In this paper, we concentrate on street names because they encompass interesting aspects of geographical as well as of personal names.
Linguistic descriptions and criteria as well as statistical considerations, in the sense of frequency distributions derived from a large database, were used in the construction of the name analysis component. The system was implemented in the framework of finite-state transducer (FST) technology (see (Sproat, 1992) for a discussion focussing on morphology). For evaluation purposes, we compared the performances of the general-purpose text analysis and the name-specific systems on training and test materials.

As of now, we have neither attempted to determine the etymological or ethnic origin of names, nor have we addressed the problem of detecting names in arbitrary text. However, due to the integration of the name component into the general text analysis system of GerTTS, the latter problem has a reasonable solution.

2 Some problems in name analysis

What makes name pronunciation difficult, or special, in comparison to words that are considered as regular entries in the lexicon of a given language? Various reasons are given in the research literature (Carlson, Granström, and Lindström, 1989; Macchi and Spiegel, 1990; Vitale, 1991; van Coile, Leys, and Mortier, 1992; Coker, Church, and Liberman, 1990; Belhoula, 1993):

  • Names can be of very diverse etymological origin and can surface in another language without undergoing the slow linguistic process of assimilation to the phonological system of the new language.
  • The number of distinct names tends to be very large: For English, a typical unabridged collegiate dictionary lists about 250,000 word types, whereas a list of surnames compiled from an address database contains 1.5 million types (72 million tokens) (Coker, Church, and Liberman, 1990). It is reasonable to assume similar ratios for German, although no precise numbers are currently available.
  • There is no exhaustive list of names; and in German and some related Germanic languages, street names in particular are usually constructed like compounds (Rheinstraße, Kennedyallee), which makes decomposition both practical and necessary.
  • Name pronunciation is known to be idiosyncratic; there are many pronunciations contradicting common phonological patterns, as well as alternative pronunciations for certain grapheme strings.
  • In many languages, general-purpose grapheme-to-phoneme rules are to a significant extent inappropriate for names (Macchi and Spiegel, 1990; Vitale, 1991).
  • Names are not as amenable as regular words to morphological processes, such as word formation and derivation, or to morphological decomposition. That does not render such an approach unfeasible, though, as we show in this paper.
  • The large number of different names together with a restricted morphological structure leads to a coverage problem: It is known that a relatively small number of high-frequency words can cover a high percentage of word tokens in arbitrary text; the ratio is far less favorable for names (Carlson, Granström, and Lindström, 1989; van Coile, Leys, and Mortier, 1992).

We will now illustrate some of the idiosyncrasies and peculiarities of names that the analysis has to cope with. Let us first consider morphological issues. Some German street names can be morphologically and lexically analyzed, such as Kurfürst-en-damm ('electoral prince dam') and Kirche-n-weg ('church path'). Many, however, are not decomposable, such as Henmerich ('?') or Rimparstraße ('?Rimpar street'), at least not beyond obvious and unproblematic components (Straße, Weg, Platz, etc.).

Even more serious problems arise on the phonological level. As indicated above, general-purpose pronunciation rules often do not apply to names. For instance, the grapheme ⟨e⟩ in an open stressed syllable is usually pronounced [e:]; however, in many first names (Stefan, Melanie) it is pronounced [ɛ].
Or consider the word-final grapheme string ⟨ie⟩ in Batterie [batər'i:] 'battery', Materie [mat'e:riə] 'matter', and the name Rosemarie [r'o:zəmari:]. And word-final ⟨us⟩: Mus [m'u:s] 'mush, jam' vs. Erasmus [er'asmʊs]. A more special and yet typical example: In regular German words the morpheme-initial substring ⟨chem⟩ as in chemisch is pronounced [çe:m], whereas in the name of the city Chemnitz it is pronounced [kɛm].

Generally speaking, nothing ensures correct pronunciation better than a direct hit in a pronunciation dictionary. However, for the reasons detailed above this approach is not feasible for names. In short, we are not dealing with a memory or storage problem but with the requirement to analyze unseen orthographic strings approximately correctly.

We therefore decided to use a weighted finite-state transducer machinery, which is the technological framework for the text analysis components of the Bell Labs multilingual TTS system. FST technology enables the dynamic combination and recombination of lexical and morphological substrings, which cannot be achieved by a static pronunciation dictionary.

We will now describe the procedure of collecting lexically or morphologically meaningful graphemic substrings that are used productively in name formation.

3 Productive name components
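
As a rough illustration of the decomposition idea motivated in Section 2, the following Python sketch segments a street name into known components by minimum total cost. It is not the weighted finite-state transducer implementation of GerTTS: a small dynamic program stands in for the transducer, and the lexicon `LEXICON`, its costs, the penalty `UNKNOWN_COST_PER_CHAR`, and the function `decompose` are all hypothetical names and values chosen purely for illustration.

```python
import math

# Hypothetical component lexicon with made-up costs (lower = more plausible).
# In a real system such weights might come from frequency distributions.
LEXICON = {
    "rhein": 1.0,
    "kennedy": 1.0,
    "kirche": 1.0,
    "kurfürst": 1.0,
    "n": 2.0,    # linking morpheme
    "en": 2.0,   # linking morpheme
    "straße": 0.5,
    "allee": 0.5,
    "weg": 0.5,
    "damm": 0.5,
}

# Assumed penalty for every character not covered by a lexicon entry.
UNKNOWN_COST_PER_CHAR = 3.0


def decompose(name):
    """Return (cost, segments) for the cheapest segmentation of `name`.

    Segments not found in LEXICON are kept as unanalyzed stems and
    marked with a leading '?'.
    """
    s = name.lower()
    n = len(s)
    # best[i] holds the cheapest analysis of the prefix s[:i].
    best = [(math.inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(end):
            prev_cost, prev_segs = best[start]
            piece = s[start:end]
            if piece in LEXICON:
                cost, label = LEXICON[piece], piece
            else:
                cost, label = UNKNOWN_COST_PER_CHAR * len(piece), "?" + piece
            total = prev_cost + cost
            # Strict comparison keeps the first (longest) unknown stem on ties.
            if total < best[end][0]:
                best[end] = (total, prev_segs + [label])
    return best[n]


if __name__ == "__main__":
    for street in ["Rheinstraße", "Kennedyallee", "Kirchenweg",
                   "Kurfürstendamm", "Rimparstraße"]:
        print(street, decompose(street))
```

Under these assumed costs, Kirchenweg is split into kirche + n + weg, whereas Rimparstraße keeps an unanalyzed stem in front of straße, which mirrors the distinction between decomposable and non-decomposable names drawn in Section 2.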

Similar resources

The Bell Labs German text-to-speech system: an overview

In this paper we present an overview of the German version of the Bell Labs text-to-speech system, a high-quality concatenative synthesis system with extensive text analysis capabilities. We discuss problems of text analysis, and our solutions to these problems, including: the integration of text normalization tasks into linguistic text analysis; the capability to morphologically analyze compou...

A German viseme-set for automatic transcription of input text used for audio-visual speech synthesis

In this paper, we introduce a German viseme inventory for visemically transcribing text according to phonetic transcription. A viseme set like the one presented in this work is essential for speech-driven audio-visual synthesis due to the fact that the selection of appropriate video segments is based on the visemically transcribed input text. For text-to-speech synthesis, a transcription of the...

Word and syllable models for German text-to-speech synthesis

The correct pronunciation of unknown or novel words is one of the biggest challenges for text-to-speech systems. In this paper we describe the implementation of unknown word analysis as a central component of the text analysis module in the Bell Labs German text-to-speech system. The implementation is based on a model of the morphological structure of words and on the study of the productivity ...

The AT&T German text-to-speech system: realistic linguistic description

Like many current TTS systems the AT&T German text-to-speech system is based on the methods of unit selection and concatenative synthesis [1]. This paper highlights efforts to improve TTS quality by closely matching the speakers' original productions with linguistic descriptions. On the segmental level this is achieved by adjusting the speakers' individual productions to an established, general...

Speech Synthesis and Standard Pronunciation of German

The Institute of Speech Sciences and Phonetics at Halle University is currently working on a new dictionary of German pronunciation, including the development of a “talking dictionary”. This development is done in cooperation with the Laboratory of Acoustics and Speech Communication at Dresden University of Technology. For this purpose a preliminary study was conducted, using a high-quality spe...

Publication date: 1997